Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties
نویسندگان
چکیده
Single amino acid variations (SAVs) potentially alter biological functions, including causing diseases or natural differences between individuals. Identifying the relationship between a SAV and certain disease provides the starting point for understanding the underlying mechanisms of specific associations, and can help further prevention and diagnosis of inherited disease.We propose PredSAV, a computational method that can effectively predict how likely SAVs are to be associated with disease by incorporating gradient tree boosting (GTB) algorithm and optimally selected neighborhood features. A two-step feature selection approach is used to explore the most relevant and informative neighborhood properties that contribute to the prediction of disease association of SAVs across a wide range of sequence and structural features, especially some novel structural neighborhood features. In cross-validation experiments on the benchmark dataset, PredSAV achieves promising performances with an AUC score of 0.908 and a specificity of 0.838, which are significantly better than that of the other existing methods. Furthermore, we validate the capability of our proposed method by an independent test and gain a competitive advantage as a result. PredSAV, which combines gradient tree boosting with optimally selected neighborhood features, can return reliable predictions in distinguishing between disease-associated and neutral variants. Compared with existing methods, PredSAV shows improved specificity as well as increased overall performance.
منابع مشابه
Gradient Boosting for Conditional Random Fields
Gradient Boosting for Conditional Random Fields Report Title In this paper, we present a gradient boosting algorithm for tree-shaped conditional random fields (CRF). Conditional random fields are an important class of models for accurate structured prediction, but effective design of the feature functions is a major challenge when applying CRF models to real world data. Gradient boosting, which...
متن کاملUrban Link Travel Time Prediction Based on a Gradient Boosting Method Considering Spatiotemporal Correlations
The prediction of travel times is challenging because of the sparseness of real-time traffic data and the intrinsic uncertainty of travel on congested urban road networks. We propose a new gradient–boosted regression tree method to accurately predict travel times. This model accounts for spatiotemporal correlations extracted from historical and real-time traffic data for adjacent and target lin...
متن کاملOptimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these al...
متن کاملBoosting Protein Threading Accuracy
Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdep...
متن کاملAlgorithms for Collaborative Prediction
Collaborative filtering is a framework for providing recommendations of items in an e-commerce system tailored to a specific user based on information gleaned from past behavior of both that user and other users of the system. In this thesis, we focus on the problem of predicting the rating out a 5-star scale that a given user will assign to a new item based on all prior ratings in the system. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 12 شماره
صفحات -
تاریخ انتشار 2017